12 research outputs found
Confident Object Detection via Conformal Prediction and Conformal Risk Control: an Application to Railway Signaling
Deploying deep learning models in real-world certified systems requires the
ability to provide confidence estimates that accurately reflect their
uncertainty. In this paper, we demonstrate the use of the conformal prediction
framework to construct reliable and trustworthy predictors for detecting
railway signals. Our approach builds on a novel dataset of images
taken from the perspective of a train operator, together with state-of-the-art
object detectors. We test several conformal approaches and introduce a new method
based on conformal risk control. Our findings demonstrate the potential of the
conformal prediction framework to evaluate model performance and provide
practical guidance for achieving formally guaranteed uncertainty bounds.
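A minimal sketch of the conformal risk control recipe this abstract alludes to, on synthetic data: given a per-image loss that shrinks as predicted boxes are expanded by a margin lambda, pick the smallest lambda whose adjusted calibration risk stays below a target level alpha. The additive pixel-margin parametrization, the loss, and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def conformal_risk_control(cal_losses, lambdas, alpha, B=1.0):
    """Smallest margin lambda whose adjusted empirical risk is <= alpha.

    cal_losses[i, j]: loss of calibration image i when predicted boxes are
    expanded by margin lambdas[j]; assumed bounded in [0, B] and
    nonincreasing in lambda. lambdas must be sorted ascending.
    """
    n = cal_losses.shape[0]
    risks = cal_losses.mean(axis=0)
    # CRC guarantee: E[loss] <= alpha whenever (n/(n+1)) R_hat + B/(n+1) <= alpha
    ok = (n / (n + 1)) * risks + B / (n + 1) <= alpha
    if not ok.any():
        raise ValueError("alpha is too strict for this calibration set")
    return lambdas[np.argmax(ok)]  # first lambda that satisfies the bound

# synthetic calibration losses standing in for per-image box miscoverage
rng = np.random.default_rng(0)
lambdas = np.linspace(0.0, 50.0, 101)          # candidate pixel margins
base = rng.uniform(1.0, 60.0, size=(500, 1))
cal_losses = np.clip(1.0 - lambdas[None, :] / base, 0.0, 1.0)
print(conformal_risk_control(cal_losses, lambdas, alpha=0.1))
```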
Gradient strikes back: How filtering out high frequencies improves explanations
Recent years have witnessed an explosion in the development of novel
prediction-based attribution methods, which have slowly been supplanting older
gradient-based methods to explain the decisions of deep neural networks.
However, it is still not clear why prediction-based methods outperform
gradient-based ones. Here, we start with an empirical observation: these two
approaches yield attribution maps with very different power spectra, with
gradient-based methods revealing more high-frequency content than
prediction-based methods. This observation raises multiple questions: What is
the source of this high-frequency information, and does it truly reflect
decisions made by the system? Lastly, why would the absence of high-frequency
information in prediction-based methods yield better explainability scores
along multiple metrics? We analyze the gradient of three representative visual
classification models and observe that it contains noisy information emanating
from high frequencies. Furthermore, our analysis reveals that the operations
used in Convolutional Neural Networks (CNNs) for downsampling appear to be a
significant source of this high-frequency content -- suggesting aliasing as a
possible underlying basis. We then apply an optimal low-pass filter to
attribution maps and demonstrate that it improves gradient-based attribution
methods. We show that (i) removing high-frequency noise yields significant
improvements in the explainability scores obtained with gradient-based methods
across multiple models -- leading to (ii) a novel ranking of state-of-the-art
methods with gradient-based methods at the top. We believe that our results
will spur renewed interest in simpler and computationally more efficient
gradient-based methods for explainability.
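As a concrete illustration of the paper's central manipulation, here is a hedged sketch that removes high-frequency content from a 2-D attribution map with an ideal low-pass filter in the Fourier domain. The hard cutoff, its value, and the random stand-in map are assumptions; the paper derives an optimal filter rather than hand-picking one.

```python
import numpy as np

def low_pass(attribution, keep=0.1):
    """Zero out spatial frequencies above a fraction `keep` of Nyquist."""
    h, w = attribution.shape
    fy = np.fft.fftfreq(h)[:, None]        # cycles per pixel, Nyquist = 0.5
    fx = np.fft.fftfreq(w)[None, :]
    mask = np.sqrt(fx**2 + fy**2) <= keep * 0.5
    return np.real(np.fft.ifft2(np.fft.fft2(attribution) * mask))

# stand-in for an input-gradient saliency map of a CNN classifier
grad_map = np.random.default_rng(0).normal(size=(224, 224))
smooth_map = low_pass(grad_map, keep=0.1)  # same map, high frequencies removed
```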
Towards Understanding the Mechanism of Contrastive Learning via Similarity Structure: A Theoretical Analysis
Contrastive learning is an efficient approach to self-supervised
representation learning. Although recent studies have made progress in the
theoretical understanding of contrastive learning, the investigation of how to
characterize the clusters of the learned representations is still limited. In
this paper, we aim to elucidate the characterization from theoretical
perspectives. To this end, we consider a kernel-based contrastive learning
framework termed Kernel Contrastive Learning (KCL), where kernel functions play
an important role when applying our theoretical results to other frameworks. We
introduce a formulation of the similarity structure of learned representations
by utilizing a statistical dependency viewpoint. We investigate the theoretical
properties of the kernel-based contrastive loss via this formulation. We first
prove that the formulation characterizes the structure of representations
learned with the kernel-based contrastive learning framework. We show a new
upper bound on the classification error of a downstream task, implying
that our theory is consistent with the empirical success of contrastive
learning. We also establish a generalization error bound for KCL. Finally, we
show a guarantee for the generalization ability of KCL to the downstream
classification task via a surrogate bound.
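To make "kernel-based contrastive loss" concrete, here is a hedged sketch of a generic objective in that family: align embeddings of two augmented views under a kernel while pushing apart embeddings of different examples. The Gaussian kernel and this particular alignment-minus-spread decomposition are assumptions; the paper's exact KCL objective and theory are richer.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    """Pairwise k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_contrastive_loss(z1, z2, gamma=1.0):
    """z1[i], z2[i]: embeddings of two augmented views of example i."""
    k = gaussian_kernel(z1, z2, gamma)
    pos = np.diag(k).mean()                        # positive-pair similarity
    n = k.shape[0]
    neg = (k.sum() - np.trace(k)) / (n * (n - 1))  # cross-example similarity
    return neg - pos                               # minimize: spread - alignment

z1 = np.random.default_rng(0).normal(size=(128, 16))
z2 = z1 + 0.1 * np.random.default_rng(1).normal(size=(128, 16))
print(kernel_contrastive_loss(z1, z2))
```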
Learning Domain Invariant Representations by Joint Wasserstein Distance Minimization
Domain shifts in the training data are common in practical applications of
machine learning; they occur, for instance, when the data comes from
different sources. Ideally, an ML model should work well independently of these
shifts, for example, by learning a domain-invariant representation. Moreover,
privacy concerns regarding the source also require a domain-invariant
representation. In this work, we provide theoretical results that link
domain-invariant representations -- measured by the Wasserstein distance on the joint
distributions -- to a practical semi-supervised learning objective based on a
cross-entropy classifier and a novel domain critic. Quantitative experiments
demonstrate that the proposed approach is indeed able to learn such
an invariant representation (between two domains) in practice, and that this
representation also supports models with higher predictive accuracy on both
domains, comparing favorably to existing techniques.
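A hedged PyTorch sketch of the kind of objective this abstract describes: a cross-entropy classifier on the labeled source domain plus a domain critic whose value estimates a Wasserstein-1 distance that the encoder then minimizes. The architectures, weight clipping as the Lipschitz constraint, and the use of feature marginals rather than the paper's joint (feature, label) distributions are simplifying assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
classifier = nn.Linear(16, 10)
critic = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()

def train_step(x_src, y_src, x_tgt, lam=0.1, n_critic=5):
    # 1) critic ascends its Wasserstein-1 estimate between the two domains
    for _ in range(n_critic):
        z_s = encoder(x_src).detach()
        z_t = encoder(x_tgt).detach()
        w_est = critic(z_s).mean() - critic(z_t).mean()
        opt_critic.zero_grad()
        (-w_est).backward()
        opt_critic.step()
        for p in critic.parameters():      # crude 1-Lipschitz constraint
            p.data.clamp_(-0.01, 0.01)
    # 2) encoder + classifier minimize source CE + estimated domain distance
    z_s, z_t = encoder(x_src), encoder(x_tgt)
    loss = ce(classifier(z_s), y_src) + lam * (critic(z_s).mean()
                                               - critic(z_t).mean())
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()
    return loss.item()

x_s, y_s = torch.randn(32, 64), torch.randint(0, 10, (32,))
x_t = torch.randn(32, 64)
print(train_step(x_s, y_s, x_t))
```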
A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation
In recent years, concept-based approaches have emerged as some of the most
promising explainability methods to help us interpret the decisions of
Artificial Neural Networks (ANNs). These methods seek to discover intelligible
visual 'concepts' buried within the complex patterns of ANN activations in two
key steps: (1) concept extraction followed by (2) importance estimation. While
these two steps are shared across methods, they all differ in their specific
implementations. Here, we introduce a unifying theoretical framework that
comprehensively defines and clarifies these two steps. This framework offers
several advantages as it allows us: (i) to propose new evaluation metrics for
comparing different concept extraction approaches; (ii) to leverage modern
attribution methods and evaluation metrics to extend and systematically
evaluate state-of-the-art concept-based approaches and importance estimation
techniques; (iii) to derive theoretical guarantees regarding the optimality of
such methods. We further leverage our framework to tackle a crucial
question in explainability: how to efficiently identify clusters of data points
that are classified based on a shared strategy. To illustrate these
findings and to highlight the main strategies of a model, we introduce a visual
representation called the strategic cluster graph. Finally, we present
https://serre-lab.github.io/Lens, a dedicated website that offers a complete
compilation of these visualizations for all classes of the ImageNet dataset.
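A hedged sketch of the two-step recipe this abstract names, using non-negative matrix factorization for concept extraction and a simple linear sensitivity score for importance estimation. The NMF choice, the random stand-in activations, and the read-out-based score are illustrative assumptions; the paper unifies and evaluates a whole family of such estimators.

```python
import numpy as np
from sklearn.decomposition import NMF

# Step 1: concept extraction. Factorize non-negative activations A
# (n_patches x d) as A ~ U @ W, where rows of W are 'concepts' and U
# holds per-patch concept coefficients.
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(200, 512)))     # stand-in for ReLU activations
nmf = NMF(n_components=10, init="nndsvd", max_iter=500)
U = nmf.fit_transform(A)                    # concept coefficients, (200, 10)
W = nmf.components_                         # concept bank, (10, 512)

# Step 2: importance estimation. A simple sensitivity score: how much a
# linear read-out's class logit moves per unit of each concept (illustrative;
# the paper studies several estimators and their optimality).
readout = rng.normal(size=512)              # stand-in for class weights
importance = W @ readout                    # one score per concept
print(np.argsort(-importance)[:3])          # most important concepts
```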
Conformal Prediction for Trustworthy Detection of Railway Signals
We present an application of conformal prediction, a form of uncertainty quantification with guarantees, to the detection of railway signals. State-of-the-art architectures are tested and the most promising one undergoes the process of conformalization, where a correction is applied to the predicted bounding boxes (i.e., to their height and width) such that they comply with a predefined probability of success. We work with a novel exploratory dataset of images taken from the perspective of a train operator, as a first step towards building and validating future trustworthy machine learning models for the detection of railway signals.
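The correction step described here (complementing the risk-control sketch given earlier for the companion paper) admits a compact split-conformal reading: pad every predicted width and height by the calibration-set quantile of how far predictions fall short of the ground truth. The additive residual score and all variable names are assumptions, not the paper's exact conformalization.

```python
import numpy as np

def conformal_margin(pred_wh, true_wh, alpha=0.1):
    """Split conformal: margin to add to predicted width/height so that,
    with probability >= 1 - alpha, the corrected box is at least as large
    as the ground truth along each axis (additive residual score is an
    illustrative choice).
    """
    scores = np.max(true_wh - pred_wh, axis=1)   # worst shortfall per box
    n = len(scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    return max(q, 0.0)

# usage on synthetic calibration boxes
rng = np.random.default_rng(0)
true_wh = rng.uniform(20, 200, size=(300, 2))
pred_wh = true_wh + rng.normal(0, 5, size=(300, 2))  # noisy detector output
m = conformal_margin(pred_wh, true_wh, alpha=0.1)
corrected = pred_wh + m          # widen and heighten every predicted box
```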